A Fast K-prototypes Algorithm Using Partial Distance Computation

Author

  • Byoungwook Kim
Abstract

The k-means algorithm is one of the most popular and widely used clustering algorithms; however, it is limited to numeric data. The k-prototypes algorithm is one of the best-known algorithms for clustering data with both numeric and categorical attributes. Yet there have been no studies on accelerating the k-prototypes algorithm. In this paper, we propose a new fast k-prototypes algorithm that produces the same result as the original k-prototypes algorithm. The proposed algorithm avoids unnecessary distance computations by using partial distance computation: it finds the cluster center nearest to an object without computing the distance over all attributes, which reduces the amount of computation required. Partial distance computation exploits the fact that the difference between two values of a categorical attribute is at most 1. If data objects have m categorical attributes, the categorical part of the distance between an object and a cluster center is therefore at most m. Our algorithm first computes distances using only the numeric attributes. If the difference between the smallest and the second-smallest of these numeric distances is greater than m, the nearest cluster center can be determined without computing the categorical part of the distance. Experimental results show that the proposed k-prototypes algorithm achieves better computational performance than the original k-prototypes algorithm on our dataset.
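The pruning rule described above can be illustrated with a short Python sketch. It is a minimal illustration, not the author's implementation, and it assumes the standard k-prototypes dissimilarity: squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of categorical mismatches. With `gamma = 1` the categorical term is at most m, as stated in the abstract; for a general non-negative `gamma` the bound becomes `gamma * m`. The function names `numeric_dist`, `categorical_dist`, `nearest_full` and `nearest_partial` are hypothetical.

```python
from typing import List, Sequence, Tuple

# A cluster center (prototype): (numeric part, categorical part).
# Assumption: standard k-prototypes dissimilarity with weight gamma >= 0.
Center = Tuple[Sequence[float], Sequence[str]]


def numeric_dist(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance over the numeric attributes."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def categorical_dist(x: Sequence[str], c: Sequence[str]) -> int:
    """Simple matching dissimilarity: each mismatched attribute adds 1."""
    return sum(1 for a, b in zip(x, c) if a != b)


def nearest_full(x_num: Sequence[float], x_cat: Sequence[str],
                 centers: List[Center], gamma: float = 1.0) -> int:
    """Baseline: full mixed distance to every center; ties go to the lowest index."""
    dists = [numeric_dist(x_num, c_num) + gamma * categorical_dist(x_cat, c_cat)
             for c_num, c_cat in centers]
    return min(range(len(dists)), key=dists.__getitem__)


def nearest_partial(x_num: Sequence[float], x_cat: Sequence[str],
                    centers: List[Center], gamma: float = 1.0) -> Tuple[int, bool]:
    """Return (index of nearest center, True if categorical distances were skipped)."""
    num = [numeric_dist(x_num, c_num) for c_num, _ in centers]
    if len(num) == 1:
        return 0, True
    # sorted() is stable, so numeric ties keep their index order.
    order = sorted(range(len(num)), key=num.__getitem__)
    best, second = order[0], order[1]
    # The categorical term of any distance lies in [0, gamma * m].
    max_cat = gamma * len(x_cat)
    # For every other center j: d_j >= num[j] >= num[second] > num[best] + max_cat >= d_best,
    # so `best` is the unique nearest center and no categorical work is needed.
    if num[second] > num[best] + max_cat:
        return best, True
    # Otherwise fall back to full distances, reusing the numeric parts already computed.
    dists = [num[j] + gamma * categorical_dist(x_cat, centers[j][1])
             for j in range(len(centers))]
    return min(range(len(dists)), key=dists.__getitem__), False


if __name__ == "__main__":
    centers = [((0.0, 0.0), ("red", "small")),
               ((10.0, 10.0), ("blue", "large")),
               ((0.5, 0.0), ("blue", "small"))]
    print(nearest_partial((9.0, 9.5), ("blue", "large"), centers))  # (1, True)
    print(nearest_partial((0.2, 0.0), ("blue", "small"), centers))  # (2, False)
```

In the second call the numerically closest center (index 0) is not the nearest once its categorical mismatch is counted, so the gap test fails and the sketch falls back to the full computation. Because the skip is taken only when the gap strictly exceeds the largest possible categorical contribution, the returned index always matches `nearest_full`.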


Similar sources

An Effective Montgomery Algorithm Using Multiplier Circuits

Modular exponentiation is the cornerstone computation in public key cryptography systems such as RSA cryptosystems. The operation is time consuming for large operands. This paper describes the characteristics of three architectures designed to implement modular exponentiation using the fast binary method: the first field programmable gate array (FPGA) prototype has a sequential architecture, th...

Fast Classification with Binary Prototypes

In this work, we propose a new technique for fast k-nearest neighbor (k-NN) classification in which the original database is represented via a small set of learned binary prototypes. The training phase simultaneously learns a hash function which maps the data points to binary codes, and a set of representative binary prototypes. In the prediction phase, we first hash the query into a binary cod...

Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes

As the number of images increases, it is essential to provide fast search methods and intelligent filtering of images. To handle images in large datasets, some relevant tags are assigned to each image to describe its content. Automatic Image Annotation (AIA) aims to automatically assign a group of keywords to an image based on visual content of the image. AIA frameworks have two main sta...

The Design of a Nearest-Neighbor Classifier and Its Use for Japanese Character Recognition

The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. Although the brute-force NN algorithm is simple and has high accuracy, its computation cost is usually very expensive, especially for applications such as Japanese character recognition in which the number of categories is large. Many methods have been proposed to improve the efficiency of NN...

A modification of the LAESA algorithm for approximated k-NN classification

Nearest-neighbour (NN) and k-nearest-neighbours (k-NN) techniques are widely used in many pattern recognition classification tasks. The linear approximating and eliminating search algorithm (LAESA) is a fast NN algorithm which does not assume that the prototypes are defined in a vector space; it only makes use of some of the distance properties (mainly the triangle inequality) in order to avoid...

Journal:
  • Symmetry

Volume 9

Publication date: 2017